A faster alternative to Pandas `isin` function

Posted by user3576212 on Stack Overflow See other posts from Stack Overflow or by user3576212
Published on 2014-05-30T01:06:16Z Indexed on 2014/05/30 3:26 UTC
Read the original article Hit count: 321

Filed under:

python

|

numpy

|

pandas

I have a very large data frame df that looks like:

ID       Value1    Value2
1345      3.2      332
1355      2.2      32
2346      1.0      11
3456      8.9      322

And I have a list that contains a subset of IDs ID_list. I need to have a subset of df for the ID contained in ID_list.

Currently, I am using df_sub=df[df.ID.isin(ID_list)] to do it. But it takes a lot time. IDs contained in ID_list doesn't have any pattern, so it's not within certain range. (And I need to apply the same operation to many similar dataframes. I was wondering if there is any faster way to do this. Will it help a lot if make ID as the index?

Thanks!

© Stack Overflow or respective owner

Related posts about python

unmet dependencies in Ubuntu 12.04

as seen on Ask Ubuntu - Search for 'Ask Ubuntu'
I tried today to install a dvb-card on my Ubuntu 12.04 (Linux blauhai-linux 3.2.0-25-generic #40-Ubuntu SMP Wed May 23 20:30:51 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux ). The installation failed with an error. After that, i tried to install python (it was already installed but i got this error): linux:~$… >>> More
How can I get sikuli-ide to work?

as seen on Ask Ubuntu - Search for 'Ask Ubuntu'
I installed sikuli-ide with sudo apt-get install sikuli-ide Everything was fine until I tried to start it from the terminal. I typed sikuli-ide But the only response I got was [info] locale: en_US The application was not started, furthermore there is no desktop file and sikuli-ide does not… >>> More
Getting PATH right for python after MacPorts install

as seen on Super User - Search for 'Super User'
I can't import some python libraries (PIL, psycopg2) that I just installed with MacPorts. I looked through these forums, and tried to adjust my PATH variable in $HOME/.bash_profile in order to fix this but it did not work. I added the location of PIL and psycopg2 to PATH. I know that Terminal is… >>> More
call python with system() in R to run a python script emulating the python console

as seen on Stack Overflow - Search for 'Stack Overflow'
I want to pass a chunk of Python code to Python in R with something like system('python ...'), and I'm wondering if there is an easy way to emulate the python console in this case. For example, suppose the code is "print 'hello world'", how can I get the output like this in R? >>> print… >>> More
Python - Calling a non python program from python?

as seen on Stack Overflow - Search for 'Stack Overflow'
Hi, I am currently struggling to call a non python program from a python script. I have a ~1000 files that when passed through this C++ program will generate ~1000 outputs. Each output file must have a distinct name. The command I wish to run is of the form: program_name -input -output -o1 -o2… >>> More

Related posts about numpy

append a numpy array to a numpy array

as seen on Stack Overflow - Search for 'Stack Overflow'
I just started programming in python and am very new to numpy packages.. so still trying to get a hang of it. I have a an numpy_array so something like [ a b c] And then I want to append it into anotehr numpyarray (Just like we create a list of lists) How do we create an array of numpy arrays containing… >>> More
How do you construct an array suitable for numpy sorting?

as seen on Stack Overflow - Search for 'Stack Overflow'
I need to sort two arrays simultaneously, or rather I need to sort one of the arrays and bring the corresponding element of its associated array with it as I sort. That is if the array is [(5, 33), (4, 44), (3, 55)] and I sort by the first axis (labeled below dtype='alpha') then I want: [(3.0, 55… >>> More
Building an interleaved buffer for pyopengl and numpy

as seen on Stack Overflow - Search for 'Stack Overflow'
I'm trying to batch up a bunch of vertices and texture coords in an interleaved array before sending it to pyOpengl's glInterleavedArrays/glDrawArrays. The only problem is that I'm unable to find a suitably fast enough way to append data into a numpy array. Is there a better way to do this? I… >>> More
Helping install mrcwa and solve problems with f2py in Ubuntu 14.04 LTS

as seen on Ask Ubuntu - Search for 'Ask Ubuntu'
I am sorry if this is the wrong section but I am starting to get desperate, please someone help me... I need to install the program mrcwa-20080820 (sourceforge.net/projects/mrcwa/) because a summer project that I am involved. I need to use it together with anaconda (store.continuum.io/cshop/anaconda/)… >>> More
Compound assignment operators in Python's Numpy library

as seen on Programmers - Search for 'Programmers'
The "vectorizing" of fancy indexing by Python's numpy library sometimes gives unexpected results. For example: import numpy a = numpy.zeros((1000,4), dtype='uint32') b = numpy.zeros((1000,4), dtype='uint32') i = numpy.random.random_integers(0,999,1000) j = numpy.random.random_integers(0,3,1000) a[i… >>> More